Note
Please install these on your system now.
# for all kinds of things
install.packages("tidyverse")
# for plotting
install.packages("ggforce")
install.packages("sf")
install.packages("lattice")
install.packages("viridisLite")
install.packages("rnaturalearth")
install.packages("rnaturalearthdata")
# for population genetics
install.packages("adegenet")
install.packages("pegas")
install.packages("poppr")
install.packages("hierfstat")
Set RStudio options.
Save your script in a file.
We’ll talk about RMarkdown later.
Use comments to understand your code better.
Functions (generally) take an input and return an output in R.
For example, the function sum() takes a numeric vector and will return a single value.
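A minimal sketch of such a function call. The values are made up for illustration and are not taken from the original slides:

```r
# c() builds a numeric vector; sum() takes that vector as its input
my_numbers <- c(1, 2, 3)

# sum() returns a single value: the total of all elements
total <- sum(my_numbers)
total
```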
Note
c() is itself a function creating a numeric vector.
We’ll discuss more what a numeric vector is soon.
Troubleshoot using ? or help().
You can learn what package the function is from, what the function does and what arguments it takes.
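As a rough illustration, the calls below look up sum() and sd(). The choice of these two functions is only an example:

```r
# Open the help page for sum(); both forms are equivalent
?sum
help("sum")

# args() shows which arguments a function accepts
args(sum)

# environmentName() reports which package a function such as sd() comes from
sd_package <- environmentName(environment(sd))
sd_package
```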
Warning
Different packages might have functions with the same name. 🤯
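One common way around name clashes is to write package::function(), which states explicitly which package to use. A small sketch with made-up values:

```r
x <- c(2, 4, 6, 8)

# package::function() states explicitly which package the function comes from
spread <- stats::sd(x)
spread

# Both stats and dplyr export a function called filter();
# writing stats::filter() always picks the stats version (a moving average here)
moving_avg <- stats::filter(x, rep(1 / 2, 2))
moving_avg
```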
If a function doesn’t work it will display an error…
…but these error messages aren’t always easy to understand.
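To look at an error message without stopping a script, one option is tryCatch(). This is only a sketch; the failing call sum("a") is a made-up example:

```r
# sum() cannot add text, so this call fails with an error.
# tryCatch() intercepts the error and keeps only its message text.
error_message <- tryCatch(
  sum("a"),
  error = function(e) conditionMessage(e)
)
error_message
```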
Basic kinds of R objects (or ‘classes’):
Use str() if you’re unsure!
Assign an object with <- (or ->).
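A short sketch of assignment and of checking what was created. The object names are hypothetical:

```r
# Assign with <- (the usual style) or with -> (value on the left)
my_number <- 42
"hello" -> my_text
my_flag <- TRUE

# str() reports the type and content of each object
str(my_number)
str(my_text)
str(my_flag)

# class() returns just the class name
class(my_number)
```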
A vector holds \(\geq\) 1 values of the same type.
Vectors have 1 dimension (a length).
Select particular values using ‘indexing’ with [].
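A few common indexing patterns, sketched on a made-up vector:

```r
my_vector <- c(10, 20, 30, 40)

# A vector has one dimension: its length
length(my_vector)

# Positions start at 1 in R
second <- my_vector[2]

# Several positions at once
first_and_third <- my_vector[c(1, 3)]

# A negative index drops that position
without_first <- my_vector[-1]

# A logical condition keeps only the matching values
large_values <- my_vector[my_vector > 25]
```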
Watch out for missing data.
# This numeric vector has some unusual values
missing_data <- c(NULL, 1.1, 0.2, NA, 7, NaN, Inf)
# NULL: Empty (c() drops it, so the vector has 6 values)
# NA: Missing data (can be any type)
# NaN: Not a number (specific to numeric)
# Inf: Infinity
str(missing_data)
 num [1:6] 1.1 0.2 NA 7 NaN ...
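A sketch of how such values can be detected and handled, reusing the same vector. The helper object names are hypothetical:

```r
missing_data <- c(NULL, 1.1, 0.2, NA, 7, NaN, Inf)

# is.na() is TRUE for both NA and NaN; is.nan() is TRUE only for NaN
na_flags <- is.na(missing_data)
nan_flags <- is.nan(missing_data)

# na.rm = TRUE skips NA and NaN, but Inf is still included in the total
total <- sum(missing_data, na.rm = TRUE)

# is.finite() keeps only ordinary numbers (no NA, NaN or Inf)
clean <- missing_data[is.finite(missing_data)]
clean_mean <- mean(clean)
```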
A matrix holds \(\geq\) 1 value(s) of the same type arranged in two dimensions.
Index a matrix with [].
Note
Remember, now we have two dimensions.
So we index with [ROW, COLUMN].
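A small sketch of matrix indexing, using a made-up 2 x 3 matrix:

```r
# matrix() fills the values column by column by default
my_matrix <- matrix(1:6, nrow = 2, ncol = 3)

# Two dimensions: number of rows, number of columns
dim(my_matrix)

# A single value: row 1, column 2
one_value <- my_matrix[1, 2]

# Leave one side of the comma empty to get a whole row or column
second_row <- my_matrix[2, ]
third_column <- my_matrix[, 3]
```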
A list contains any number of items.
Each item can be a different type.
A list has one dimension (length: the number of items in the list).
We can index a list with [] and [[]].
Warning
They have slightly different meanings!
When list-elements are named, they can be accessed using either [[]] or $.
## Create a named list
my_named_list <- list(first = c(1, 2, 3),
                      second = c("A", "B", "C"),
                      third = c(3, 4, 5))
str(my_named_list)
List of 3
$ first : num [1:3] 1 2 3
$ second: chr [1:3] "A" "B" "C"
$ third : num [1:3] 3 4 5
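The difference between [], [[]] and $ can be sketched on the same named list. The object names on the left-hand side are hypothetical:

```r
my_named_list <- list(first = c(1, 2, 3),
                      second = c("A", "B", "C"),
                      third = c(3, 4, 5))

# [] returns a shorter list that still wraps the element
sub_list <- my_named_list["first"]
str(sub_list)

# [[]] returns the element itself (here a numeric vector)
first_values <- my_named_list[["first"]]
str(first_values)

# $ is shorthand for [[]] with a name
second_values <- my_named_list$second

# [[]] also works with a position
third_values <- my_named_list[[3]]
```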
Note
Many advanced functions will store their output as a list object.
Remember, you can use str() to understand them better.
List of 12
$ coefficients : Named num [1:2] 6.526 -0.223
..- attr(*, "names")= chr [1:2] "(Intercept)" "Sepal.Width"
$ residuals : Named num [1:150] -0.644 -0.956 -1.111 -1.234 -0.722 ...
..- attr(*, "names")= chr [1:150] "1" "2" "3" "4" ...
$ effects : Named num [1:150] -71.566 -1.188 -1.081 -1.187 -0.759 ...
..- attr(*, "names")= chr [1:150] "(Intercept)" "Sepal.Width" "" "" ...
$ rank : int 2
$ fitted.values: Named num [1:150] 5.74 5.86 5.81 5.83 5.72 ...
..- attr(*, "names")= chr [1:150] "1" "2" "3" "4" ...
$ assign : int [1:2] 0 1
$ qr :List of 5
..$ qr : num [1:150, 1:2] -12.2474 0.0816 0.0816 0.0816 0.0816 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:150] "1" "2" "3" "4" ...
.. .. ..$ : chr [1:2] "(Intercept)" "Sepal.Width"
.. ..- attr(*, "assign")= int [1:2] 0 1
..$ qraux: num [1:2] 1.08 1.02
..$ pivot: int [1:2] 1 2
..$ tol : num 1e-07
..$ rank : int 2
..- attr(*, "class")= chr "qr"
$ df.residual : int 148
$ xlevels : Named list()
$ call : language lm(formula = Sepal.Length ~ Sepal.Width, data = iris)
$ terms :Classes 'terms', 'formula' language Sepal.Length ~ Sepal.Width
.. ..- attr(*, "variables")= language list(Sepal.Length, Sepal.Width)
.. ..- attr(*, "factors")= int [1:2, 1] 0 1
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : chr [1:2] "Sepal.Length" "Sepal.Width"
.. .. .. ..$ : chr "Sepal.Width"
.. ..- attr(*, "term.labels")= chr "Sepal.Width"
.. ..- attr(*, "order")= int 1
.. ..- attr(*, "intercept")= int 1
.. ..- attr(*, "response")= int 1
.. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
.. ..- attr(*, "predvars")= language list(Sepal.Length, Sepal.Width)
.. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
.. .. ..- attr(*, "names")= chr [1:2] "Sepal.Length" "Sepal.Width"
$ model :'data.frame': 150 obs. of 2 variables:
..$ Sepal.Length: num [1:150] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
..$ Sepal.Width : num [1:150] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
..- attr(*, "terms")=Classes 'terms', 'formula' language Sepal.Length ~ Sepal.Width
.. .. ..- attr(*, "variables")= language list(Sepal.Length, Sepal.Width)
.. .. ..- attr(*, "factors")= int [1:2, 1] 0 1
.. .. .. ..- attr(*, "dimnames")=List of 2
.. .. .. .. ..$ : chr [1:2] "Sepal.Length" "Sepal.Width"
.. .. .. .. ..$ : chr "Sepal.Width"
.. .. ..- attr(*, "term.labels")= chr "Sepal.Width"
.. .. ..- attr(*, "order")= int 1
.. .. ..- attr(*, "intercept")= int 1
.. .. ..- attr(*, "response")= int 1
.. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
.. .. ..- attr(*, "predvars")= language list(Sepal.Length, Sepal.Width)
.. .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
.. .. .. ..- attr(*, "names")= chr [1:2] "Sepal.Length" "Sepal.Width"
- attr(*, "class")= chr "lm"
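The structure above is what str() reports for a fitted linear model; its call entry reads lm(formula = Sepal.Length ~ Sepal.Width, data = iris). The sketch below assumes R's built-in iris data set, which may differ slightly from the version used in these slides, and shows how individual list elements can be pulled out:

```r
# Fit the model named in the call entry (built-in iris assumed)
fit <- lm(Sepal.Length ~ Sepal.Width, data = iris)

# The model object is a list, so list tools apply
is.list(fit)
names(fit)

# Access one element with $ ...
fit$coefficients

# ... and a single value inside it with [[]]
slope <- fit$coefficients[["Sepal.Width"]]
slope

# Many list elements also have a dedicated accessor function
coef(fit)
```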
A data frame is a special type of list.
A data frame has two dimensions (number of rows and number of columns).
We can index using [] (just like a matrix) or using column names.
We usually create a data frame by reading in a .csv file!
'data.frame': 150 obs. of 5 variables:
$ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
$ Species : chr "setosa" "setosa" "setosa" "setosa" ...
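A minimal sketch of creating, indexing and reading a data frame. The column names, values and temporary file are made up for illustration:

```r
# A small made-up data frame
my_df <- data.frame(
  sample_id = c("s1", "s2", "s3"),
  sepal_length = c(5.1, 4.9, 4.7)
)

# Index like a matrix: [ROW, COLUMN]
one_cell <- my_df[2, 2]
first_row <- my_df[1, ]

# Or index by column name
length_column <- my_df$sepal_length
same_column <- my_df[["sepal_length"]]

# Round trip through a temporary .csv file
csv_path <- tempfile(fileext = ".csv")
write.csv(my_df, csv_path, row.names = FALSE)
from_csv <- read.csv(csv_path)
str(from_csv)
```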
Use functions head(), tail(), or summary() to investigate a large data frame.
Sepal.Length Sepal.Width Petal.Length Petal.Width
Min. :4.300 Min. :2.000 Min. :1.000 Min. :0.100
1st Qu.:5.100 1st Qu.:2.800 1st Qu.:1.600 1st Qu.:0.300
Median :5.800 Median :3.000 Median :4.350 Median :1.300
Mean :5.843 Mean :3.057 Mean :3.758 Mean :1.199
3rd Qu.:6.400 3rd Qu.:3.300 3rd Qu.:5.100 3rd Qu.:1.800
Max. :7.900 Max. :4.400 Max. :6.900 Max. :2.500
Species
Length:150
Class :character
Mode :character
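These functions can be tried on R's built-in iris data set. This is assumed to be close to the data shown above, although there Species appears as character rather than factor:

```r
# First and last rows (6 by default; n sets the number)
first_rows <- head(iris, n = 3)
last_rows <- tail(iris, n = 3)
first_rows
last_rows

# Per-column summary statistics
summary(iris)

# Number of rows and columns
dim(iris)
```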
A pipe (|>) passes the value on its left as the first argument to the function on its right.
This is typically useful to make the code of complex data wrangling compact and readable:
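A sketch comparing a nested call with the same steps written as a pipe. It assumes R 4.1 or newer (where the native |> pipe is available) and the built-in iris data set; the values are made up:

```r
# Nested calls have to be read from the inside out
nested <- round(mean(sqrt(c(1, 4, 9, 16))), 1)

# The same steps with pipes read from top to bottom
piped <- c(1, 4, 9, 16) |>
  sqrt() |>
  mean() |>
  round(1)

# A small data-wrangling chain using base R functions only
setosa_means <- iris |>
  subset(Species == "setosa", select = c(Sepal.Length, Sepal.Width)) |>
  colMeans()
setosa_means
```

Both versions give the same result; the piped form simply lists the steps in the order they are carried out.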
An RMarkdown (.Rmd) file is a great way to record and share your analyses!
Include code and output in the same document.
Include plots to make a report.
Write plain text to keep notes.
‘Knit’ your notes to create a report.
Create a new RMarkdown file in RStudio (File > New File > RMarkdown).
Create a new chunk of R code:
Knit the document to html.
Check the document.
Does the code work properly? If not, can you work out why?
BONUS:
Search for the RMarkdown Cheatsheet online and try adding some headers and bold text.
Knit the document to PDF.